FEAT Embed schema in SelfAskRefusalScorer #1432

Open
riedgar-ms wants to merge 9 commits into Azure:main from riedgar-ms:riedgar-ms/selfask-jsonschema-01

Conversation

@riedgar-ms
Contributor

Description

The SelfAskRefusalScorer is written to expect a JSON response, and its default prompt text specifies a particular schema. Augment the seed YAML with this schema, and ensure it can be passed to the model. The lack of a JSONObject type in Python's typing module causes a certain amount of mypy ugliness.

This pattern could be rolled out more widely.

Noticed in the course of #1346

Tests and Documentation

No tests currently, since this shouldn't affect any behaviour. Something could probably be added in test_self_ask_refusal_scorer.py, but right now the call to send_prompt_async isn't validated in any way by the mock.

Copilot AI review requested due to automatic review settings March 2, 2026 19:43
Contributor

Copilot AI left a comment

Pull request overview

This PR embeds a JSON response schema into the SelfAskRefusalScorer’s seed YAML and plumbs that schema through scoring so compatible prompt targets can request schema-constrained JSON output.

Changes:

  • Add response_json_schema to SeedPrompt and populate it in the default refusal scorer YAML.
  • Update SelfAskRefusalScorer to load and forward the schema into _score_value_with_llm.
  • Extend _score_value_with_llm to accept an optional schema and attach it to prompt_metadata for JSON-response-capable targets.
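The plumbing described in the list above can be sketched roughly as follows. This is a minimal illustration, not PyRIT's actual code: `SeedPromptSketch`, `build_prompt_metadata`, the `response_format` key, and the contents of `REFUSAL_SCHEMA` are all assumptions made for the example. Only the `response_json_schema` field name, the `json_schema` metadata key, and the `score_value` / `rationale` keys come from the PR.

```python
from dataclasses import dataclass
from typing import Any, Optional

# Illustrative schema; the real one lives in refusal_default.yaml and may differ.
REFUSAL_SCHEMA: dict[str, Any] = {
    "type": "object",
    "properties": {
        "score_value": {"type": "string", "enum": ["True", "False"]},
        "rationale": {"type": "string"},
    },
    "required": ["score_value", "rationale"],
    "additionalProperties": False,
}


@dataclass
class SeedPromptSketch:
    """Hypothetical stand-in for SeedPrompt with only the fields used here."""

    value: str
    response_json_schema: Optional[dict[str, Any]] = None


def build_prompt_metadata(seed: SeedPromptSketch) -> dict[str, Any]:
    """Build request metadata, attaching the schema only when one is set."""
    # "response_format" is an assumed key name for requesting JSON output.
    metadata: dict[str, Any] = {"response_format": "json"}
    if seed.response_json_schema is not None:
        metadata["json_schema"] = seed.response_json_schema
    return metadata
```

A seed without a schema produces metadata with no `json_schema` entry, so targets that do not support schema-constrained output would see the same request as before.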

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| pyrit/score/true_false/self_ask_refusal_scorer.py | Loads schema from the seed prompt and forwards it into LLM scoring + identifier params. |
| pyrit/score/scorer.py | Adds schema parameter and injects it into request metadata for JSON response formatting. |
| pyrit/models/seeds/seed_prompt.py | Introduces a new optional response_json_schema field on SeedPrompt. |
| pyrit/datasets/score/refusal/refusal_default.yaml | Defines the refusal scorer response JSON schema and tightens the schema text in the prompt. |

Comment on lines 188 to 192
category=self._score_category,
objective=objective,
attack_identifier=message_piece.attack_identifier,
response_json_schema=self._response_json_schema,
)

Copilot AI Mar 2, 2026

New behavior: the scorer now forwards a JSON schema into _score_value_with_llm, which changes the request metadata sent to targets that support JSON schema response formatting. Add/extend a unit test (e.g., in tests/unit/score/test_self_ask_refusal.py) to assert the call includes the expected prompt_metadata["json_schema"] (and that it’s correctly serialized) so regressions are caught.
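One possible shape for such a test is sketched below, using `unittest.mock.AsyncMock` to inspect what the target was called with. `ScorerSketch`, its `score_async` method, the keyword arguments passed to `send_prompt_async`, and the `response_format` key are hypothetical stand-ins. The real scorer and target signatures in PyRIT may differ.

```python
import asyncio
import json
from typing import Any, Optional
from unittest.mock import AsyncMock


class ScorerSketch:
    """Hypothetical stand-in for the scorer; not PyRIT's real class."""

    def __init__(self, target: Any, schema: Optional[dict[str, Any]]) -> None:
        self._target = target
        self._schema = schema

    async def score_async(self, text: str) -> dict[str, Any]:
        metadata: dict[str, Any] = {"response_format": "json"}
        if self._schema is not None:
            metadata["json_schema"] = self._schema
        raw = await self._target.send_prompt_async(prompt=text, prompt_metadata=metadata)
        return json.loads(raw)


def check_schema_is_forwarded() -> dict[str, Any]:
    schema = {"type": "object", "required": ["score_value", "rationale"]}
    target = AsyncMock()
    target.send_prompt_async.return_value = '{"score_value": "True", "rationale": "refused"}'

    result = asyncio.run(ScorerSketch(target, schema).score_async("I cannot help with that."))

    # Inspect the arguments the mock actually received, not just its return value.
    target.send_prompt_async.assert_awaited_once()
    sent = target.send_prompt_async.await_args.kwargs["prompt_metadata"]
    assert sent["json_schema"] == schema
    return result
```

The assertion on `await_args` is what addresses the gap noted in the PR description, where the mocked call to `send_prompt_async` is not validated in any way.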

Copilot uses AI. Check for mistakes.
attack_identifier (Optional[ComponentIdentifier]): The attack identifier.
Defaults to None.
response_json_schema (Optional[dict[str, str]]): An optional JSON schema (not just dict[str, str])
to validate the response against. Defaults to None.

Copilot AI Mar 2, 2026

Docstring says the schema is used "to validate the response against", but _score_value_with_llm only forwards schema metadata to the target (which may constrain generation) and does not perform any local JSON Schema validation of the returned payload. Reword to reflect actual behavior (request/constraint) or add explicit validation if that’s the intent.

Suggested change
- to validate the response against. Defaults to None.
+ provided to the target to guide or constrain the JSON structure of the response. Defaults to None.
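If explicit validation were the intent instead, a local check might look roughly like the sketch below. It is deliberately partial: `check_required_keys` is a hypothetical helper that only verifies the schema's top-level `required` keys, and a full JSON Schema validator (for example the third-party `jsonschema` package) would cover far more of the specification.

```python
import json
from typing import Any


def check_required_keys(payload: str, schema: dict[str, Any]) -> dict[str, Any]:
    """Parse a JSON reply and verify the schema's top-level "required" keys.

    This is not full JSON Schema validation; types, enums, and nested
    properties are not checked.
    """
    parsed = json.loads(payload)
    if not isinstance(parsed, dict):
        raise ValueError("expected a JSON object")
    missing = [key for key in schema.get("required", []) if key not in parsed]
    if missing:
        raise ValueError(f"response is missing required keys: {missing}")
    return parsed
```

Forwarding the schema to the target and validating the reply locally are separate steps, so the docstring should describe whichever one the function actually performs.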

metadata_output_key: str = "metadata",
category_output_key: str = "category",
attack_identifier: Optional[ComponentIdentifier] = None,
response_json_schema: Optional[dict[str, str]] = None,

Copilot AI Mar 2, 2026

response_json_schema is typed as dict[str, str], but the schema being passed around is a nested JSON object (dicts/lists/bools). This type is inaccurate and makes it easy to misuse (and contradicts the docstring note that it’s “not just dict[str, str]”). Widen this to something like dict[str, Any] (or a dedicated JSON type alias) so the signature reflects actual values.

Suggested change
- response_json_schema: Optional[dict[str, str]] = None,
+ response_json_schema: Optional[dict[str, Any]] = None,

Comment on lines +552 to +555
# The 'cast' here is ugly, but is in the pattern of json_helper.py
# Fundamentally, Python does not offer anything in Typing to represent
# JSON structures
prompt_metadata["json_schema"] = cast("str", response_json_schema)

Copilot AI Mar 2, 2026

cast("str", response_json_schema) does not convert the schema to a string; it only silences type checking while still storing a dict in prompt_metadata. This is misleading and makes the metadata type contract unclear. Prefer serializing with json.dumps(response_json_schema) (and keep prompt_metadata values primitive), or explicitly widen the metadata typing/contract if nested objects are intended.

Suggested change
- # The 'cast' here is ugly, but is in the pattern of json_helper.py
- # Fundamentally, Python does not offer anything in Typing to represent
- # JSON structures
- prompt_metadata["json_schema"] = cast("str", response_json_schema)
+ # Store the JSON schema as a serialized string to keep prompt_metadata values primitive
+ prompt_metadata["json_schema"] = json.dumps(response_json_schema)
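The difference between the two approaches can be seen in a small standalone snippet. The schema contents here are illustrative only.

```python
import json
from typing import cast

schema = {"type": "object", "required": ["score_value", "rationale"]}

# cast() is a no-op at runtime: it returns its argument unchanged.
casted = cast("str", schema)
print(type(casted).__name__)      # dict

# json.dumps() actually produces a string, which round-trips through json.loads().
serialized = json.dumps(schema)
print(type(serialized).__name__)  # str

assert json.loads(serialized) == schema
```

Serializing keeps the metadata values primitive, but any consumer that expects a dict under `json_schema` would then need to call `json.loads` on it, so the choice depends on what the targets read from the metadata.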

Comment on lines +40 to +45
# Optional JSON schema for constraining the response
# Not actually dict[str,str], necessarily, but a full JSON object.
# Type follows pattern from json_helper.py since Python's `typing`
# does not include the concept of a generic JSON object.
response_json_schema: Optional[dict[str, str]] = None


Copilot AI Mar 2, 2026

response_json_schema is declared as Optional[dict[str, str]], but the YAML schema content is a nested JSON object (contains dicts, lists, booleans). Update this to Optional[dict[str, Any]] (or a project-wide JSON type alias) to accurately model the data and avoid incorrect typing downstream.

Comment on lines +108 to +111
# If present, the following will be a full JSON object, not
# just a dict[str,str]. We are following the pattern from
# json_helper.py for representing JSON schemas as dicts.
self._response_json_schema = seed_prompt.response_json_schema

Copilot AI Mar 2, 2026

This comment notes the schema is a “full JSON object”, but the type flowing from SeedPrompt.response_json_schema is currently dict[str, str], which doesn’t match nested schema structures. Once the SeedPrompt field type is widened, consider annotating self._response_json_schema accordingly (e.g., Optional[dict[str, Any]]) to keep SelfAskRefusalScorer’s internal state and identifier params consistent.

@riedgar-ms
Contributor Author

I'm not massively keen on how I've hidden the type of the schema (and this hiding has provoked most of Copilot's comments). The solution would be defining a JSONObject type, in the manner of:
python/typing#182
However, that could have quite a blast radius.
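For reference, a recursive alias along those lines might look like the sketch below. The names `JSONValue`, `JSONObject`, and `top_level_keys` are illustrative and are not defined in PyRIT. Recent mypy releases accept recursive aliases of this form, though that is worth confirming against the version the project pins.

```python
from typing import Optional, Union

# Recursive alias in the spirit of python/typing#182; names are illustrative.
JSONValue = Union[str, int, float, bool, None, list["JSONValue"], dict[str, "JSONValue"]]
JSONObject = dict[str, JSONValue]


def top_level_keys(schema: Optional[JSONObject]) -> list[str]:
    """Return the sorted top-level keys of a schema, or an empty list if unset."""
    return sorted(schema) if schema is not None else []


nested: JSONObject = {
    "type": "object",
    "required": ["score_value", "rationale"],
    "additionalProperties": False,
}
print(top_level_keys(nested))
```

A narrower alternative with a smaller blast radius is `Optional[dict[str, Any]]`, which gives up checking of nested values but avoids the `cast`.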

Start the response with **only** a valid JSON object.

Your response should contain a JSON response with the following keys: "score_value", "rationale"
- "score_value" (str): The True or False result as described above. This could be "True" or "False".
Contributor Author

Out of scope for this PR, but JSON does define a boolean type.
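To illustrate the point, the snippet below parses the same verdict in both shapes. The payloads are made up for the example; only the `score_value` and `rationale` keys come from the prompt.

```python
import json

# Current shape: the verdict is a string that must be compared against "True"/"False".
as_string = json.loads('{"score_value": "True", "rationale": "refused"}')
# Alternative shape: a native JSON boolean, which json.loads maps to a Python bool.
as_boolean = json.loads('{"score_value": true, "rationale": "refused"}')

print(type(as_string["score_value"]).__name__)   # str
print(type(as_boolean["score_value"]).__name__)  # bool

# Pitfall with the string form: any non-empty string is truthy, including "False".
print(bool("False"))  # True
```

With a schema in place, the boolean form could be requested with `{"type": "boolean"}` for `score_value`, at the cost of changing how existing responses are parsed.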

Copilot AI review requested due to automatic review settings March 3, 2026 16:18
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated no new comments.
